Entry Name:  "CMICH-Zhang-MC3"

VAST 2013 Challenge
Mini-Challenge 3: Visual Analytics for Network Situation Awareness

 

 

Team Members:

Tao Zhang, Department of Computer Science, Central Michigan University, zhang3t@cmich.edu     PRIMARY (Point of contact for questions/answers)
Qi Liao, Department of Computer Science, Central Michigan University, liao1q@cmich.edu

Lei Shi, State Key Laboratory of Computer Science, Chinese Academy of Sciences, shil@ios.ac.cn

 

Student Team:  Yes

 

Analytic Tools Used:

Tableau

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2013 is complete? Yes

 

Video:

http://cps.cmich.edu/liao1q/video/VC13MC3.wmv

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC3.1 – Provide a timeline (i.e., events organized in chronological order) of the notable events that occur in Big Marketing’s computer networks for the two weeks of supplied data. Use all data at your disposal to identify up to twelve events and describe them to the extent possible.  Your answer should be no more than 1000 words long and may contain up to twelve images.

 

Event #1: Connection problem on unidentified workstations

Start: 4/1/2013 8:57:54 AM

End: 4/7/2013 8:59:56 AM

Description: Nearly a hundred unidentified workstations appeared in the Big Brother dataset during the first week, each reporting a ‘no status’ message. Their hostnames, ‘WSS2-101.BIGMKT2.COM’ through ‘WSS2-200.BIGMKT2.COM’, are not listed in the data description file ‘BigMktNetwork.txt’, and none of them has any record after the ‘Big Brother data gap’. In addition, unlike the servers, the workstations appear to have suffered from a connection problem throughout the two weeks: the large number of ‘problem’ statuses across the three subnets looks unscheduled.

Image:

Description: 4

Figure #1: Gantt chart of the ‘conn’ status information from all workstations, with one plot per subnet. Status 1 (good) is filtered out; statuses 2 (warning), 3 (problem), and 4 (hostname did not send status information) are shown.
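The check for hostnames missing from the network description can be sketched outside Tableau. This is a minimal illustration, not the procedure we ran: the function name `unlisted_hosts` and the sample hostnames `WSS2-01` and `WSS2-02` are hypothetical, and we assume both hostname lists have already been extracted from the Big Brother data and from ‘BigMktNetwork.txt’.

```python
def unlisted_hosts(reported_hosts, known_hosts):
    """Return reported hostnames that are absent from the known list.

    Names are compared case-insensitively and returned in upper case.
    """
    known = {host.strip().upper() for host in known_hosts}
    reported = {host.strip().upper() for host in reported_hosts}
    return sorted(reported - known)


# Illustrative inputs only; the two "known" names are hypothetical.
known = ["WSS2-01.BIGMKT2.COM", "WSS2-02.BIGMKT2.COM"]
reported = ["wss2-01.bigmkt2.com", "WSS2-101.BIGMKT2.COM", "WSS2-200.BIGMKT2.COM"]
print(unlisted_hosts(reported, known))
```

Normalizing case before the comparison avoids flagging a host only because the two sources spell its name differently.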

 

 

Event #2: Machine disk problem

Start: 4/1/2013 8:40:18 AM

End: 4/15/2013 9:30:09 AM

Description: The disk usage of three machines looked suspicious. Workstation ‘administrator’ and server ‘web01’ sent ‘warning’ status messages from the very start of the Big Brother dataset, and both deteriorated further around the time Event #6 occurred. After the ‘gap’, the disk status of workstation ‘administrator’ returned to normal (1-good), but server ‘web01’ stayed in ‘problem’ status to the end. These two machines and server ‘web03’ differ markedly from the rest of the servers in both disk usage level and disk status.

Image:

Description: 3

Figure #2: The Gantt chart plots the disk status of all servers on a timeline covering the entire Big Brother dataset; color encodes the status value from 1 to 4. The lower circle chart combines all server disk information: color encodes the average status value and size encodes the average disk usage.

 

 

Event #3: CPU warning

Start: [1] 4/1/2013 8:44:55 AM, [2] 4/10/2013 6:57:46 AM, [3] 4/11/2013 6:54:56 AM, [4] 4/12/2013 11:59:16 PM

End: [1] 4/1/2013 9:14:47 AM, [2] 4/10/2013 8:06:41 AM, [3] 4/11/2013 7:05:39 AM, [4] 4/13/2013 12:33:50 AM

Description: Servers reported a ‘warning’ CPU status together on four occasions. The first occasion involved all mail servers and web servers in Enterprise site 2. The second also included all DC servers. On the last two occasions, almost all servers reported a ‘warning’ CPU status.

Image:

Description: 2

Figure #3: Gantt chart of the CPU status of all servers over the entire Big Brother dataset. Color encodes the ‘StatusValue’ attribute. A ‘StatusValue’ filter removes all records with good (1) CPU status, so only the abnormal values 2 and 4 are shown.
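The filter-then-plot step behind this Gantt chart can be approximated in code. The sketch below drops records with status 1 and merges the remaining reports into episodes with a start and an end time. The function name `warning_episodes`, the 10-minute merge threshold and the sample records are assumptions for illustration, not part of our Tableau workflow.

```python
from datetime import datetime, timedelta


def warning_episodes(records, max_gap=timedelta(minutes=10)):
    """Group abnormal reports (status != 1) into (start, end) episodes.

    records: iterable of (timestamp, status_value) pairs.
    Abnormal reports that are at most max_gap apart join the same episode.
    """
    abnormal_times = sorted(ts for ts, status in records if status != 1)
    episodes = []
    for ts in abnormal_times:
        if episodes and ts - episodes[-1][1] <= max_gap:
            episodes[-1][1] = ts  # extend the current episode
        else:
            episodes.append([ts, ts])  # open a new episode
    return [(start, end) for start, end in episodes]


# Made-up records, not values from the challenge data.
t0 = datetime(2013, 4, 1, 8, 45)
records = [
    (t0, 2),
    (t0 + timedelta(minutes=5), 2),
    (t0 + timedelta(minutes=10), 1),
    (t0 + timedelta(hours=2), 2),
]
for start, end in warning_episodes(records):
    print(start, "to", end)
```

Episodes are defined purely by how close the abnormal reports are in time, so a threshold slightly above the reporting interval is the natural choice.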

 

 

Event #4: Server spike in Network flow data: a few servers’ payloads are significantly higher

Start: [1] April 1, 2013 10 PM, [2] April 3, 2013 7 AM, [3] April 11, 2013 10 AM, [4] April 14, 2013 12 PM

End: [1] April 2, 2013 7 AM, [2] April 3, 2013 1 PM, [3] April 11, 2013 1 PM, [4] April 14, 2013 4 PM

Description: In the servers’ network flow dataset there are four spikes in which ‘Connection Count’, ‘Total Bytes’, and ‘Packet Count’ rise at the same time. Further analysis showed that the spikes are caused by a significantly higher payload on a few target servers.

Image:

Description: 6_1

Figure #4: Servers aggregated by destination, as an example. Area plots of the sum of the connection count, the sum of ‘FirstSeenDestPacketCount’, and the sum of ‘FirstSeenDestTotalBytes’ per ‘StartTime’ hour.

 

Description: 6

Figure #5: Tree maps contrasting a spike day (left) with a normal day (right). Color encodes the sum of ‘FirstSeenDestPacketCount’ and size encodes the sum of the connection count. On the spike day, April 11, servers ‘web03’, ‘web02’, ‘web01’ and ‘web02l’ clearly dominate. On the regular day, April 13, the payloads of most servers are at nearly the same level.
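The hourly aggregation behind the area plots can be sketched as follows. We found the spikes visually; the median-based threshold here is only one hypothetical way to flag them automatically, and the function names, the factor of 3 and the sample values are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def hourly_totals(flows):
    """Sum a numeric field per clock hour. flows: iterable of (start_time, value)."""
    totals = defaultdict(int)
    for start_time, value in flows:
        hour = start_time.replace(minute=0, second=0, microsecond=0)
        totals[hour] += value
    return dict(totals)


def spike_hours(totals, factor=3.0):
    """Return the hours whose total exceeds factor times the median hourly total."""
    if not totals:
        return []
    baseline = median(totals.values())
    return sorted(hour for hour, total in totals.items() if total > factor * baseline)


# Made-up flow records, not values from the challenge data.
flows = [
    (datetime(2013, 4, 11, 9, 10), 60),
    (datetime(2013, 4, 11, 9, 40), 40),
    (datetime(2013, 4, 11, 10, 5), 900),
    (datetime(2013, 4, 11, 11, 30), 110),
    (datetime(2013, 4, 11, 12, 15), 90),
]
totals = hourly_totals(flows)
print(spike_hours(totals))
```

The same aggregation would apply to the connection count, packet count or byte count; only the value column changes.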

 

 

Event #5: Problem server ‘web03’

Start: 4/3/2013 12:49:31 PM

End: 4/5/2013 8:25:53 AM

Description: During the event period, server ‘web03’ sent no status information (status value 4) for almost every type of Big Brother message. The exception was type ‘conn’, for which it reported ‘problem’ over the same period. In addition, some of the server’s reports arrived only about every 30 minutes, much less often than the 5-minute interval of other machines.

Image:

Description: 1

Figure #6: Gantt chart over report time. Color encodes the ‘StatusValue’ attribute, and the ‘StatusValue’ filter ranges from 1 to 4 (from ‘good’ to ‘no status’). The three-day gap of Event #6 appears as an uncolored (white) area.
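The difference in reporting frequency can also be measured directly. The sketch below computes the median interval between consecutive reports for each host. The function name, the placeholder host names and the sample timestamps are illustrative assumptions; in our analysis the difference was read off the Gantt chart.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def median_report_interval(reports):
    """Return {host: median minutes between consecutive reports}.

    reports: iterable of (host, timestamp) pairs.
    Hosts with fewer than two reports are omitted.
    """
    by_host = defaultdict(list)
    for host, ts in reports:
        by_host[host].append(ts)
    intervals = {}
    for host, stamps in by_host.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
        if gaps:
            intervals[host] = median(gaps)
    return intervals


# Made-up reports with placeholder host names, not the challenge data.
start = datetime(2013, 4, 3, 13, 0)
reports = [("host_a", start + timedelta(minutes=5 * i)) for i in range(4)]
reports += [("host_b", start + timedelta(minutes=30 * i)) for i in range(3)]
print(median_report_interval(reports))
```

The median is used instead of the mean so that a single long outage does not distort the typical interval.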

 

 

Event #6: A three-day ‘gap’ for Network Health and Status Data (Big Brother)

Start: 4/7/2013 9:01:36 AM

End: 4/10/2013 6:58:41 AM

Description: No records exist for any message type or machine type during this period. Figure #6 and Figure #2, both based on the Big Brother dataset, show the gap.
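A gap like this one can be located without a chart by looking for the longest interval between consecutive record timestamps. The following is a minimal sketch; the function name `largest_gap` and the sample timestamps are illustrative assumptions, not the exact values in the dataset.

```python
from datetime import datetime


def largest_gap(timestamps):
    """Return (gap_start, gap_end) for the longest interval between consecutive records."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


# Made-up timestamps, not values from the challenge data.
stamps = [
    datetime(2013, 4, 7, 8, 55),
    datetime(2013, 4, 7, 9, 0),
    datetime(2013, 4, 10, 7, 0),
    datetime(2013, 4, 10, 7, 5),
]
gap_start, gap_end = largest_gap(stamps)
print(gap_start, gap_end, gap_end - gap_start)
```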

 

 

Event #7: Workstation ‘Num_Info’ spike in IPS data

Start: [1] April 10, 2013 7 AM, [2] April 10, 2013 5 PM, [3] April 12, 2013 6 AM, [4] April 13, 2013 10 PM, [5] April 15, 2013 5 AM

End: [1] April 10, 2013 9 AM, [2] April 11, 2013 8 AM, [3] April 12, 2013 9 AM, [4] April 14, 2013 9 AM, [5] April 15, 2013 8 AM

Description: The ‘Num_Info’ message-priority count in the workstations’ IPS data rose several times. The rises can be traced mainly to a few workstations.

Image:

Description: 8_1

Figure #7: Area plot of the sum of ‘Num_Info’ per ‘DateTime’ hour. Color distinguishes the subnets.

 

Description: 8_2

Figure #8: Tree maps of workstations from different subnets in terms of ‘Num_Info’ and ‘DateTime’ hour. Color distinguishes the enterprise sites (e.g. ‘wss1’ means ‘BigMkt1’) and size encodes the sum of the ‘Num_Info’ count. In both the source and destination roles, some workstations stand out, such as IP 172.10.2.106.

 

 

Event #8: Workstations did not send ‘mem’ and ‘disk’ status

Start: 4/10/2013 7:08:02 AM

End: 4/15/2013 9:25:48 AM

Description: Most workstations stopped sending both ‘mem’ and ‘disk’ status information after 4/10/2013 7:08:02 AM, when the Big Brother gap ended. In subnet #2, unidentified workstations were still present at the end of the two weeks. We also found that each workstation reports its Big Brother ‘disk’ status in a pattern similar to its ‘mem’ status.

Image:

Description: 5

Figure #9: Gantt charts of workstation status information, one plot per message type (left: ‘disk’, right: ‘mem’). Color encodes the ‘StatusValue’ attribute, which takes only the values 1 and 4 in this dataset.

 

 

Event #9: Server ‘Num_Info’ Spike in IPS data

Start: April 11, 2013 10 AM

End: April 11, 2013 1 PM

Description: The ‘Num_Info’ message-priority count in the servers’ IPS data rose suddenly. A few servers caused the rise.

Image:

Description: 7

Figure #10: Area plot of the sum of ‘Num_Info’ per ‘DateTime’ hour. Color distinguishes the servers appearing as ‘Src’ or ‘Dst’.

 

Description: 7_1

Figure #11: Tree maps of ‘Num_Info’ per ‘DateTime’ hour for ‘Src_server’ and ‘Dst_server’. Color distinguishes the servers and size encodes the sum of the ‘Num_Info’ count. In both the source and destination roles, servers ‘web03’, ‘web01’, ‘web02l’ and ‘web02’ dominate the maps.
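The ranking that a tree map conveys through tile size can be reproduced numerically by summing ‘Num_Info’ per host and listing the largest contributors with their share of the total. This is a hedged sketch: the function name `top_contributors`, the placeholder host names and the counts are invented for illustration.

```python
from collections import Counter


def top_contributors(rows, n=4):
    """Return [(key, total, share)] for the n keys with the largest summed count.

    rows: iterable of (key, count) pairs, e.g. (server name, Num_Info).
    """
    totals = Counter()
    for key, count in rows:
        totals[key] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(key, total, total / grand_total) for key, total in totals.most_common(n)]


# Made-up rows with placeholder host names, not the challenge data.
rows = [("host_a", 500), ("host_b", 200), ("host_a", 100), ("host_c", 150), ("host_d", 50)]
for key, total, share in top_contributors(rows, n=2):
    print(f"{key}: {total} ({share:.0%})")
```

A host that holds a large share of the total in the spike hour but not in a normal hour is the kind of item that dominates the tree map.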

 

 

 

 

 

MC3.2 – Speculate on one or more narratives that describe the events on the network. Provide a list of analytic hypotheses and/or unanswered questions about the notable events. In other words, if you were to hand off your timeline to an analyst who will conduct further investigation, what confirmations and/or answers would you like to see in their report back to you? Your answer should be no more than 300 words long and may contain up to three additional images.

 

About Event #5:

Analytic hypotheses:

1.       Server ‘web03’ could have been compromised, possibly by DDoS attacks, during the event period.

2.       Server ‘web03’ was intruded upon around April 3, 2013 10 AM, which caused its functionality to break down.

3.       The intrusion was still in progress on April 11 and April 14.

Description: 10_3

Figure #12: Server ‘web03’ experienced a high density of network connections four times. If these were regular rises in payload, they would follow a regular pattern like the one in Figure #13.

 

Description: 9_2

Figure #13: Area chart of workstation network connections over time, based on the network flow dataset. A spike occurs every day (except during Event #6), and each peak tops out at around 7 AM, which could be interpreted as a morning ‘network traffic jam’.

 

4.       Servers ‘web01’, ‘web02’ and ‘web02L’ have undergone a similar type of intrusion.

Description: 10_4

Figure #14: Gantt chart based on the servers’ IPS information, which could support the hypothesis of server intrusion. Color encodes the sum of the ‘Num_Info’ count. Higher values appear in deep green for servers ‘web01’, ‘web02’ and ‘web02L’ at the same time, around April 11, 2013 12 PM.

 

 

 

 

 

MC3.3 – Describe the role that your visual analytics played in enabling discovery of the notable events in MC3.1. Describe whether your visual analytics play a role in formulating the questions in MC3.2. Your answer should be no more than 300 words long and may contain up to three additional images.

 

A colored Gantt chart can place not only continuous but also discrete data on a timeline, with color as an additional dimension. Changes in value and in report density are then easy to inspect. When the number of items becomes large, we use a time-info tree map for better scalability; it highlights the most dominant items within a selected element, which can be any type of attribute. Combining these two techniques with easy-to-perceive charts of summed values, such as regular bar charts, circle charts or area charts, can speed up the discovery of anomalies in large data. Our visual analytics use different visual techniques at several levels of granularity to detect interesting events. The generality and reproducibility of the visualization methods make them relatively practical for daily network security administration.